Scalable, High-Performance, and Generalized Subtree Data Anonymization Approach for Apache Spark
Similar Resources
Scalable SDE Filtering and Inference with Apache Spark
In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative densit...
Performance Comparison of Apache Spark and Tez for Entity Resolution
Entity Resolution is among the hottest topics in the field of Big Data. It finds duplicates in datasets that actually belong to the same entity in the real world. Algorithms that perform Entity Resolution are computation-intensive and consume a lot of time, especially for large datasets. Much research has been conducted on improving Entity Resolution solutions. A number of algorithms are deve...
Generalized Approach for Data Anonymization Using Map Reduce on Cloud
Data anonymization has been extensively studied and is a widely adopted method for preserving privacy in data publishing and sharing scenarios. Data anonymization hides the sensitive data in an owner's data records to avoid unidentified risk. The privacy of an individual can be effectively preserved while some aggregate information is shared with data users for data analysis and data mining. The prop...
Scalable Anonymization Algorithms for Large Data Sets
k-Anonymity is a widely-studied mechanism for protecting identity when distributing non-aggregate personal data. This basic mechanism can also be extended to protect an individual-level sensitive attribute. Numerous algorithms have been developed in recent years for generalizing, clustering, or otherwise manipulating data to satisfy one or more anonymity requirements. However, few have consider...
A comparison on scalability for batch big data processing on Apache Spark and Apache Flink
Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Calle Periodista Daniel Saucedo Aranda, 18071 Granada, Spain. Abstract: The large amounts of data have created a need for new fram...
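The k-anonymity requirement mentioned in these abstracts, and the generalization-based anonymization named in the title, can be illustrated with a small sketch. It is not taken from any of the listed papers or from the Apache Spark approach this page describes: the taxonomy, the records, and the function names `generalize` and `is_k_anonymous` are hypothetical, and the code runs on plain Python lists rather than Spark. A "cut" of a taxonomy tree picks one generalization node on each root-to-leaf path, so every value in a subtree is replaced by the same ancestor, in the spirit of subtree generalization.

```python
from collections import Counter

# Hypothetical taxonomy for a "job" quasi-identifier, stored as child -> parent.
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}


def generalize(value, cut):
    """Walk up the taxonomy until reaching a node that belongs to the cut."""
    node = value
    while node not in cut:
        # Raises KeyError if the cut does not cover this value's path to the root.
        node = PARENT[node]
    return node


def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())


# Hypothetical records: (job, sex) quasi-identifier pairs.
records = [("Engineer", "M"), ("Lawyer", "M"), ("Dancer", "F"), ("Writer", "F")]

# Generalize each whole subtree to its root node.
cut = {"Professional", "Artist"}
anonymized = [(generalize(job, cut), sex) for job, sex in records]

print(anonymized)
print(is_k_anonymous(records, 2), is_k_anonymous(anonymized, 2))
```

With this cut, each original record is unique, so the raw data fails 2-anonymity, while the generalized rows form two groups of two and pass. A scalable implementation would distribute the grouping and counting step across workers; this sketch only shows the underlying check.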
Journal
Journal title: Electronics
Year: 2021
ISSN: 2079-9292
DOI: 10.3390/electronics10050589